3
100
14'233
56'003
8'771
The whole text has been pre-processed, in particular:
Lemmatisation
Lemmatisation is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word’s lemma, or dictionary form. For example:
StopWords removal
Please refers to this site for mission and guidelines.
In information retrieval, TF-IDF, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection.
The tf–idf value increases proportionally to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word, which helps to adjust for the fact that some words appear more frequently in general.
TFIDF helps us see how word choice changes for each cantica.
In Paradiso the most important words are Cristo, Ridere (to laugh), Letiza (gladness) while in Inferno the most important words are Bolgia (bedlam), Pena (punishment), Fiera (in this case means obstruction). From these words you can already guess the ambient and feeling that distinguishes the canticles